ABSTRACT

A hypothetical model of the transmembrane (TM) protein of human immunodeficiency virus (HIV) is proposed that is derived from the known structure of the influenza TM protein HA2. This model is consistent with computer algorithms of predicted protein structure and with known properties of TM proteins determined by sequence homology, site-directed mutations, peptide analogs, immunochemistry, or other biologic means. It is applicable to a wide variety of retroviral TM proteins differing widely in overall molecular weight.



INTRODUCTION

THE RETROVIRUSES COMPRISE a diverse family of viruses, each with a narrow host range but causing a variety of infections. Comparison of the structural and functional similarities among members of this virus family continues to provide insights into a number of biologic, immunologic, and pathogenic mechanisms. Many retrovirus genomes have been sequenced in their entirety, inviting comparisons of homologous genes and proteins. Most of these studies have concentrated on highly conserved regions, with the generation of phylogenetic trees. 1

Among the retrovirus gene products, the envelope protein is the most diverse in size and sequence. Although the env gene product is in all cases synthesized as a precursor that is subsequently cleaved into a surface attachment subunit (SU) and a membrane-anchoring TM subunit, there is otherwise little conservation between even closely related branches of the virus family. 1-3 Limited comparison of functional regions has been attempted using computer-based methods, such as hydropathy plots, sequence homology, and algorithms for predicting protein structure.

Beginning with the premise that functionally homologous viral proteins have a generally common structure, we have decided to initially step back from computer models and begin by searching for certain benchmark features that may bring order to this protein diversity. We have chosen the TM protein since it is the more conserved of the two subunits in the heterodimer. 4 Several features of HIV gp41 have been functionally defined by mutation, studies with peptide analogs, and, in the case of the fusion peptide, by sequence homology with another virus family. 3-9 With such structural and functional benchmarks in place, it is possible to design a model for the TM protein consistent with known biologic information, extendable across the whole virus family, and homologous to the known structure of the well-characterized TM protein HA2 of influenza virus. 10



METHODS

Sequences and computer analysis

Peptide sequences were obtained either directly from published sequences or from computer translations of DNA sequences available through GenBank by the IBI Pustell Sequence Analysis program. Single-letter codes are used for amino acids. 3 Cited retrovirus sequences include that for the HXB2-BH10 provirus of HIV-1, equine infectious anemia virus (EIAV), human T cell lymphotropic virus, type 2 (HTLV-2), visna virus, Mason-Pfizer monkey virus (MPMV), feline leukemia virus (FeLV), mouse mammary tumor virus (MMTV), and Rous sarcoma virus (RSV). 11-18 The peptide sequence for influenza virus is that obtained for the A/H3N2 Australia-Victoria 75 strain. 19

Each TM protein sequence was analyzed by the Hopp-Woods and Kyte-Doolittle algorithms (n = 6) for hydropathy, by the Chou-Fasman and Garnier algorithms for protein structure, available through the IBI Pustell or Intelligenetics PC Gene suite of programs, and by the Parker et al. and Margalit programs for prediction of T and B cell reactive sequences. 20-26 Structural features are compatible with computer-generated potentials for β sheet (Pb), α helix (Pa), extended chain (Pe), coil (Pc), and reverse turns (Pt), which were critically evaluated and generally followed when the probability for a given structure significantly exceeded others. Minor variations among different retroviruses and computer algorithms were readily reconciled and adaptable to a consensus structure for the virus family without substantive conflict with the predicted structure. However, keeping in mind the limits of such algorithms to predict actual structure, we have biased the model whenever the algorithm may only slightly favor one alternative structure, based on known biologic information, the overall characteristics of the influenza model, or other biochemical considerations.
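As an illustration of the windowed hydropathy analysis named in the Methods, the following minimal sketch averages Kyte-Doolittle hydropathy indices over a sliding window of six residues (n = 6). The scale values are the published Kyte-Doolittle indices. The function name `hydropathy_profile` and the choice of Python are assumptions made here for illustration; this is not the IBI Pustell or PC Gene implementation used in the study.

```python
# Kyte-Doolittle hydropathy indices (published scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=6):
    """Return the mean hydropathy of every stretch of `window` residues.

    `hydropathy_profile` is a hypothetical helper name; sequences shorter
    than the window yield an empty list.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if len(values) < window:
        return []
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # LFLGFL is the hydrophobic core of the HIV-1 fusion peptide named in the text.
    print(hydropathy_profile("LFLGFL"))
```

A strongly positive window average marks a hydrophobic stretch such as a fusion peptide or membrane-spanning segment; the same loop could be run with a hydrophilicity scale such as Hopp-Woods by swapping the dictionary.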



RESULTS

Common structural features among retrovirus TM proteins

Examination of the sequences of the TM proteins from all the major phylogenetic branches of the retrovirus family reveals that the amino-terminal 200 amino acids have a number of conserved structural characteristics, as illustrated for eight representative viruses in Figure 1. After cleavage from the precursor envelope gene product at an RXK/RR site, each has an extended hydrophobic region at or near the amino terminus, from 13 to 24 amino acids in length. 27 In HIV this region is the fusion peptide, confirmed by either mutation or peptide analogs that inhibit HIV-induced fusion. 5,8 This region is usually closely followed by a stretch of 8-17 amino acids enriched in S and T, which in HIV has been implicated as critical for the noncovalent binding of gp41 to the SU protein gp120. 3 This in turn is followed by a region with propensity for an extended α helix, according to the Chou-Fasman algorithms. The next readily identifiable region is marked by two or three vicinal cysteines beginning at position 81-90. In the oncoviruses this region is highly conserved and immunosuppressive, and in the lentiviruses HIV and EIAV this peptide comprises the principal epitope recognized by antibody to the transmembrane protein. 6,7,13,28,29 This region we therefore term the immunodominant region. This is followed by a conserved glycosylation site. Finally, a second hydrophobic sequence, beginning at position 122-152 and extending 19-27 amino acids, has been identified as the probable membrane-spanning region and anchors both the SU and TM proteins of the heterodimer to the viral envelope. 30 Beyond this region retroviruses are widely disparate, accounting for the large differences in molecular weights of the TM proteins, with from 20 to 226 additional amino acids beyond this hydrophobic segment.

Parallel structures in retrovirus and influenza virus TM proteins

[...] influenza virus shares a number of these properties, some of which have been [...] begins, after cleavage at an REKR site, with a hydrophobic amino terminus that [...] peptide. 32 The next peptide region has been identified as important in the [...] attachment protein HA1. 33 HA2 contains a region with vicinal cysteines immediately [...] glycosylation site. The preceding sequence has been determined to be highly [...] helix some 54 amino acids long. Finally, through its membrane-spanning [...] and HA1, its partner polypeptide of the heterodimer, to the viral envelope. In [...] similarities, we thought it reasonable to construct a model of the retroviral TM protein [...] influenza HA2 as a scaffold.

Structural model of the TM protein of HIV

Such a model, with each amino acid identified, is projected for HIV-1 gp41 in Figure 2. Overall, the TM protein is projected to be a fibrous structure with a significant degree of sidedness, all the immunodominant and glycosylation regions being located at the apex and left side of the protein. Several specific features emerge upon examination of this model. The laterally extended amino terminus bears the fusion peptide, including the hydrophobic core LFLGFL, with potential as a membrane insertional hairpin similar to the hydrophobic core of signal sequences. 34,35 This is followed by an internally looping structure rich in S and T (5 of 11) and interactive with gp120 and an ascending extended helix of 38 amino acids. 3 The apex is comprised of the vicinal cysteines and the immunodominant region of the protein and is succeeded by an extended glycosylated region, with localized propensity for α helix and a bend rich in S and T (4 of 7) also interactive with gp120. 5-7,29 Next is an extraordinarily highly charged region (12 of 13 are E, K, Q, or N), which probably forms an α helix stabilized by multiple "ion pairs." 36 Finally, the highly hydrophobic membrane-spanning region is depicted as an α helix with a sharp turn just after emerging from the lipid bilayer.

Certain features emerge naturally from the computer analyses. At the apex of the polypeptide, a strong turn is indicated just where appropriate to allow internal disulfide bond formation between the vicinal cysteines. All glycosylation sites are at predicted turns or extended chains. Other features are proposed in light of the precedent of the influenza virus structure or existing biologic information. The fusion peptide by computer analyses alone has been predicted to span the membrane. 37 It is shown here to be external to the lipid bilayer, where it may freely interact with its presumed cellular target, consistent with recent findings indicating that this hydrophobic sequence lacks a stop transfer signal. 38 The accessibility of the amino-terminal hydrophobic peptide is also supported by the observation that 75% of horses infected with another lentivirus, EIAV, produce antibodies reactive with the corresponding region of EIAV gp45. 29 It is shown as a loosely coiled structure, similar to the known fusion peptide structure of influenza HA2. 10 Computer models indicate a β structure for this region in most retroviruses, consistent with Chou-Fasman algorithms of such a strongly hydrophobic segment, but no strong turns are predicted that would produce a tight hairpin structure. The immunodominant region is so designated because of preponderant immunochemical evidence, even though computer analyses project other regions (such as the highly charged segment) as principal B or T cell epitopes. 6,7,28,29

FIG. 2. Model of gp41 (TM) protein of HIV-1. A linear sequence of gp41 is shown in a planar projection of the proposed structure derived from computer modeling and based on the influenza HA2 scaffold. α helices are depicted as modified helical nets alternating three and four amino acids per turn connected by single lines. Hydrophobic amino acids are indicated as solid circles, charged amino acids as open circles, and neutral amino acids as partly filled circles. Nonhelical regions are shown as loosely coiled extended chains; strong turns are indicated by T; the proposed intramolecular disulfide bond by a double line; potential glycosylation sites by stick figures.

Fibrous amphipathic core of retrovirus TM proteins

A central structure of the model is the extended α helix proposed just before the immunodominant region of the molecule. Helical net analysis of this region, shown in Figure 3, indicates that it has a propensity to be amphipathic, with a hydrophobic moment over 11 helical turns of 0.35. This propensity is even more strongly indicated in other retroviruses and is common to the entire virus family. Indeed, over 25 residues and eight helical turns in FeLV, all 10 of the hydrophobic amino acids lie at 0-160° on a helical wheel, but seven of the eight charged residues lie at 180-300°. A specific conserved feature in oncoviruses of groups B, C, and D, as well as lentiviruses, is the hydrophobic environment highlighted in Figure 3, which is maintained despite a great degree of sequence diversity among these viruses.

FIG. 3. Helical net analysis of retroviral TM protein segments. Similar regions of the TM proteins of MPMV (Q427-N464), HIV-1 (N553-Q590), FeLV (R469-N503), and MMTV (T490-L527) are shown as helical net projections, with a constant hydrophobic environment highlighted. 39
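The helical wheel and hydrophobic moment reasoning used for the amphipathic helix can be sketched as follows. The sketch assumes an ideal α helix with 100° of rotation per residue, and it uses the Kyte-Doolittle indices purely as an illustrative hydrophobicity scale; the scale behind the reported moment of 0.35 is not stated in this section, so values from this sketch are not comparable with it. The names `wheel_angles` and `hydrophobic_moment` are hypothetical.

```python
import math

# Kyte-Doolittle indices, used here only as an illustrative hydrophobicity scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

ANGLE_PER_RESIDUE = 100.0  # degrees per residue in an ideal alpha helix (assumption)


def wheel_angles(sequence):
    """Angular position in degrees (0 to 360) of each residue on a helical wheel."""
    return [(index * ANGLE_PER_RESIDUE) % 360.0 for index in range(len(sequence))]


def hydrophobic_moment(sequence, scale=KYTE_DOOLITTLE):
    """Length of the summed hydrophobicity vectors divided by the residue count."""
    if not sequence:
        return 0.0
    x = y = 0.0
    for residue, angle in zip(sequence.upper(), wheel_angles(sequence)):
        x += scale[residue] * math.cos(math.radians(angle))
        y += scale[residue] * math.sin(math.radians(angle))
    return math.hypot(x, y) / len(sequence)


if __name__ == "__main__":
    print(wheel_angles("LFLGF"))
    print(hydrophobic_moment("LFLGFL"))
```

A segment whose hydrophobic residues cluster on one face of the wheel gives a large moment, whereas a uniformly hydrophobic segment gives a moment near zero because its vectors cancel around the wheel.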

Model of conserved structures among retrovirus TM proteins

The general features of the model are equally valid for other retroviruses, with minor variations to adapt the structure to a wide variety of disparate sequences. Shown in Table 1 are the potentials from Chou-Fasman secondary structure predictions for the proposed conserved structural features among these TM proteins. The data are consistent with the proposed consensus structure and comparable to the potentials for influenza HA2, in which the extramembranal structures have been established by crystallography. For the extended helical segment, Pa exceeds Pb except in one instance. Although some α and β potentials appear quite similar, when other factors are considered the propensity for α helix is significantly increased. The Chou-Fasman prediction method does not take into account the periodic occurrence of hydrophobic residues characteristic of amphipathic helices but scores all hydrophobic residues as strong β formers. Since peptide segments within the extended helix have a significant amphipathic score of 20-40 and given that this score is a measure of the amphipathic helical character of the segment, this supports proposal of helical nature for this region. Even though there are short segments with β potential, the likelihood of continuing helix formation following nucleation is high, and the probability of the helix abruptly halting in the absence of a helix breaker tetramer is very low.

Also shown in Table 1 are the potentials for the proposed conserved turns. For the turn immediately preceding the extended helix, Pt varies from 0.990 to 1.325; values above 1.000 are frequently considered predictive of β turns. In addition, the turn potentials of the conserved turn within one to four amino acids of the vicinal cysteines are listed, with Pt values of 1.050-1.298. The potentials of the proposed conserved turn immediately inside the lipid bilayer vary most widely, with values of 0.870-1.203, but several exceed 1.10, including strong turn propensities for HIV, as depicted in Figure 2, and for EIAV.

Table 1. Computer-Generated Chou-Fasman Potentials for Conserved Helix and Turns in TM Proteins

          Turn preceding helix  Extended helix         Vicinal   Posthelix turn         Turn inside membrane
Virus     No. aa  aa    Pt      No. aa  Pa     Pb      cys, no.  No. aa   aa    Pt      No. aa   aa    Pt
HIV-1     39-42   QQNN  1.270   43-80   1.162  1.146   87, 83    87-90    CSGK  1.298   197-200  RQGY  1.158
EIAV      46-49   NSTL  1.135   50-84   1.139  1.046   91, 99    100-104  HTGH  1.105   193-196  TSSP  1.203
Visna     28-31   QQSY  1.133   32-86   1.155  1.061   90, 96    87-90    DCWH  1.050   202-205  QAYK  0.948
HTLV-2    30-33   SSKS  1.325   34-65   1.095  1.091   81, 88    76-79    EQGG  1.210   158-161  LPQR  0.870
MPMV      18-21   STGT  1.153   32-65   1.066  1.077   81, 88    76-79    EQGG  1.210   154-157  NKLM  1.138
FeLV      24-27   GTGT  1.260   28-66   1.161  0.965   86, 93    81-84    QEGG  1.210   172-175  DRIS  0.995
MMTV      40-43   HRNV  0.990   44-82   1.156  0.999   86, 94    89-92    NRDF  1.190   200-203  VQSD  1.090
RSV       17-20   GPTA  1.175   21-65   1.091  1.034   90, 97    95-98    GMCC  1.135   172-175  VSSS  1.198
FLU-HA2   70-73   FSEV  0.935   76-130  1.11   0.953   136, 142  133-136  GNGC  1.468   212-215  QRGN  1.262

Schematic projections are shown for several representative retroviruses in comparison with HA2 of influenza virus in Figure 4. In a much smaller TM protein, such as gp15 of HTLV-2 or of MPMV, both the ascending helix and the descending coil are truncated relative to HIV, but the overall structural features are consistent with the model. In other viruses, the ascending helix may be broken into multiple helical segments. The most variable region is the segment "descending" from the immunodominant apex to the site of membrane insertion. The number of glycosylation sites, position and length of helix, and the number or position of turns all vary significantly. The principal constants of the model throughout the virus family are the fusion peptide, the proposed amphipathic ascending helix, the immunodominant region, the strong turn just interior to the lipid bilayer, and the extramembranal mass of the protein before the membrane-spanning region.

FIG. 4. Schematic predicted structures of TM proteins of retroviruses and influenza virus. Solid cylinder, membrane-spanning helix; cross-hatched cylinder, extramembranous α helix; directional arrows, β sheet; solid line, fusion peptide; broken line, extended chain or random coil. Influenza HA2 structure is modified from that previously published. 10
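The turn potentials (Pt) reported in Table 1 can be approximated with a short calculation. The sketch below assumes that Pt for a four-residue turn is the mean of the four single-residue Chou-Fasman turn propensities, which is an assumption about how the tabulated values were obtained rather than something stated in the text. The propensity values are the commonly tabulated Chou-Fasman turn parameters, quoted here from memory and therefore also to be treated as an assumption; the full Chou-Fasman procedure additionally weighs position-specific turn frequencies, which this sketch omits. With these values the mean is 1.270 for QQNN and about 1.298 for CSGK, in line with the HIV-1 entries in Table 1. The function names are hypothetical.

```python
# Chou-Fasman single-residue turn propensities (commonly tabulated values;
# treated as an assumption in this sketch).
CHOU_FASMAN_TURN = {
    "A": 0.66, "R": 0.95, "N": 1.56, "D": 1.46, "C": 1.19,
    "Q": 0.98, "E": 0.74, "G": 1.56, "H": 0.95, "I": 0.47,
    "L": 0.59, "K": 1.01, "M": 0.60, "F": 0.60, "P": 1.52,
    "S": 1.43, "T": 0.96, "W": 0.96, "Y": 1.14, "V": 0.50,
}


def mean_turn_potential(tetrapeptide):
    """Mean turn propensity over exactly four residues."""
    if len(tetrapeptide) != 4:
        raise ValueError("expected a four-residue segment")
    return sum(CHOU_FASMAN_TURN[r] for r in tetrapeptide.upper()) / 4


def is_predicted_turn(tetrapeptide, threshold=1.0):
    """Apply the rule of thumb that values above 1.000 suggest a turn."""
    return mean_turn_potential(tetrapeptide) > threshold


if __name__ == "__main__":
    for segment in ("QQNN", "CSGK", "HRNV"):
        print(segment, round(mean_turn_potential(segment), 3), is_predicted_turn(segment))
```

Scanning a full sequence would amount to applying `mean_turn_potential` to every consecutive tetrapeptide and keeping the local maxima above the threshold.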

DISCUSSION 

Although any model proposed in the absence of x-ray crystallographic data is of its very nature speculative, it is useful in exploring the potential similarity of these viruses to other families of enveloped viruses and in delineating regions that may serve common functions across the retrovirus family. A prime example is the region we propose as the extended, amphipathic, α helical fiber. Its potential importance is underscored by an HIV mutant (A582 to T) that abrogates antibody neutralization of HIV at distant epitopes, presumably by conformational changes in the heterodimer, even as it decreases the helical potential of this region. 39 Such helical fibers have been identified as the structural backbone in HA2 of influenza virus, the peplomeric protein of coronaviruses, and the hemagglutinin of reovirus. 10,40,41 The glycoprotein heterodimers of retroviruses form multimers, possibly by hydrophobic bonding along such fibers to form coiled coil structures. 42,43 Mutation in this structure may disrupt the organization of the virion and its antigenic properties. Amphipathic helices are also recognized as key structural features for antigen interaction with T cells, particularly when in close proximity to epitopes recognized by B cells. Thus, the model predicts this region to be important in T cell recognition of HIV and other retroviruses, although other helical segments are predicted to have such potential and may also play a role. Such possibilities are strengthened by the finding that in the oncoviruses the peptide region just downstream from the proposed helical region also has potential for interaction with T cells, bearing similarity to interleukin-2 and being immunosuppressive. 13,28

The model thus provides a structural context in which unique regions of the TM protein can be identified and directly compared across the entire virus family, predictions made concerning function, and these predictions tested by site-directed mutations.